Supervised Learning Classifier System for Grid Data Mining
Abstract
Over the last decades, applications of data mining techniques have received increasing attention from researchers and computing professionals (J. Luo et al., 2007). Data mining in localized computing environments no longer meets the demands of today's complex scientific and industrial problems, because huge amounts of data are stored in geographically distributed databases worldwide (M. F. Santos et al., 2009; J. Luo et al., 2007). The purpose of this work is to generate a global model from distributed data sets. We consider two platforms for distributed data mining: one based on diverse distributed sites and the other on a global (central) site. There are two main methods for mining distributed data sets. The first collects all data from the different repositories, stores it in one location, and then applies data mining to the collected data to build the global model. The second applies data mining at each distributed location to generate local models, then collects and merges those models to build the global model. The first method is called Centralized Data Mining (CDM) and the second Distributed Data Mining (DDM). This paper compares the performance of the two methods on three data sets: two synthetic data sets (Monk 3 and the 11-multiplexer) and a real-world data set (Intensive Care Unit (ICU) data). Classification is one of the most popular data mining techniques; it can be defined as the process of assigning a class label to a given problem, given a set of previously defined problems (A. Orriols et al., 2005). Given the current degree of data distribution, classification becomes a challenging problem. Recent advances in grid technology make it a very promising platform for developing distributed data mining environments. In this work, the grid is designed in a parallel and distributed fashion.
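To make the CDM/DDM contrast concrete, here is a minimal Python sketch. It generates 11-multiplexer data (assuming the first 3 bits are the address and the remaining 8 are data bits), splits it across three hypothetical sites, and builds a global model both ways. The learner is a deliberately simple stand-in, not the supervised classifier system used in the paper: each model is just a table of per-input class counts, and `make_site`, `train`, `merge` and `predict` are hypothetical names. Because such counts are additive, merging the local tables reproduces the centralized model exactly; this equivalence need not hold for rule-based learners such as an LCS, which is why the merging strategy matters in DDM.

```python
import random
from collections import Counter, defaultdict


def multiplexer11(bits: tuple[int, ...]) -> int:
    # Assumed layout: 3 address bits followed by 8 data bits.
    address = bits[0] * 4 + bits[1] * 2 + bits[2]
    return bits[3 + address]


def make_site(n: int, seed: int) -> list[tuple[tuple[int, ...], int]]:
    """Draw n labelled 11-multiplexer examples for one hypothetical site."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(11))
        rows.append((x, multiplexer11(x)))
    return rows


def train(rows):
    """Stand-in learner: count how often each input appears with each class."""
    model = defaultdict(Counter)
    for x, y in rows:
        model[x][y] += 1
    return model


def merge(models):
    """DDM merge step at the central site: add up the local count tables."""
    merged = defaultdict(Counter)
    for model in models:
        for x, counts in model.items():
            merged[x].update(counts)
    return merged


def predict(model, x, default: int = 0) -> int:
    """Majority class seen for x, or a default for unseen inputs."""
    counts = model.get(x)
    return counts.most_common(1)[0][0] if counts else default


sites = [make_site(200, seed) for seed in range(3)]

# CDM: ship the raw data to one location, then learn once.
cdm_model = train([row for site in sites for row in site])

# DDM: learn at every site, ship only the local models, then merge them.
ddm_model = merge([train(site) for site in sites])

print(cdm_model == ddm_model)  # prints "True" for this additive stand-in learner
```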
A supervised learning method is used for data mining at the distributed sites: data mining is applied at every node of the grid environment. The main objective of this work is to induce a global model from the local models learned on the grid. Every node of the grid environment runs an independent supervised classifier system, and the nodes transmit their learned models to the central site, where the global model is built. This global model can represent the complete knowledge of all nodes. The construction of the global model is based on the models already induced at the distributed sites. This paper presents different strategies for merging induced models from each...
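One possible merging strategy can be sketched as follows. This is an illustration under stated assumptions, not one of the strategies from the paper: each node is assumed to send a population of rules with ternary conditions over `0`, `1` and `#` (the `#` "don't care" symbol matches either bit, as in UCS-style classifier systems). The central site takes the union of the populations, collapses duplicate rules (keeping the higher accuracy and summing numerosity), and classifies an input by an accuracy × numerosity weighted vote among the matching rules. The `Rule` fields, the hand-written site populations and the 3-bit multiplexer task are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    condition: str    # ternary string over '0', '1', '#'; '#' matches either bit
    action: int       # class predicted when the condition matches
    accuracy: float   # fraction of matched training examples the rule got right
    numerosity: int = 1


def matches(rule: Rule, x: str) -> bool:
    return all(c == "#" or c == b for c, b in zip(rule.condition, x))


def merge_populations(populations: list[list[Rule]]) -> list[Rule]:
    """Union of local populations; duplicate (condition, action) pairs are collapsed."""
    merged: dict[tuple[str, int], Rule] = {}
    for population in populations:
        for rule in population:
            key = (rule.condition, rule.action)
            if key in merged:
                old = merged[key]
                merged[key] = Rule(rule.condition, rule.action,
                                   max(old.accuracy, rule.accuracy),
                                   old.numerosity + rule.numerosity)
            else:
                merged[key] = rule
    return list(merged.values())


def predict(population: list[Rule], x: str, default: int = 0) -> int:
    """Accuracy x numerosity weighted vote among the rules that match x."""
    votes: dict[int, float] = defaultdict(float)
    for rule in population:
        if matches(rule, x):
            votes[rule.action] += rule.accuracy * rule.numerosity
    return max(votes, key=votes.get) if votes else default


def multiplexer3(x: str) -> int:
    # 1 address bit followed by 2 data bits; output is the addressed data bit.
    return int(x[1]) if x[0] == "0" else int(x[2])


# Hypothetical local models shipped by two grid nodes.
site_a = [Rule("00#", 0, 1.0), Rule("01#", 1, 1.0), Rule("1##", 1, 0.5)]
site_b = [Rule("1#0", 0, 1.0), Rule("1#1", 1, 1.0), Rule("00#", 0, 1.0)]

global_model = merge_populations([site_a, site_b])
inputs = [format(i, "03b") for i in range(8)]
print(sum(predict(global_model, x) == multiplexer3(x) for x in inputs), "/ 8 correct")
# prints "8 / 8 correct"
```

With site A alone, input `100` is matched only by the overgeneral rule `1##` and is misclassified as 1; in the merged population the accurate rule `1#0` contributed by site B outvotes it, so every input of the 3-bit multiplexer is classified correctly.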
Similar sources
Supervised Learning Classifier Systems for Grid Data Mining
This paper explores parallel and distributed implementation of the Learning Classifier System (LCS) technology. Specifically, the adaptation of supervised LCS to the requirements of grid data mining, using the agent paradigm, is studied. The paper also examines the possibility of competitive data mining model induction with homogeneous and heterogeneous data. A distributed framework is proposed using t...
A Grid Data Mining Architecture for Learning Classifier Systems
Recently, there has been growing interest among researchers and software developers in exploring Learning Classifier Systems (LCS) implemented in a parallel and distributed grid structure for data mining, due to their practical applications. The paper highlights some aspects of LCS and studies the competitive data mining model with homogeneous data. In order to establish more efficient dist...
Grid Data Mining Strategies for Outcome Prediction in Distributed Intensive Care Units
Previous work on predicting patient outcome in intensive care units brought to light some requirements, such as the need to deal with distributed data sources. Those data sources can be used to induce local prediction models, and those models can in turn be used to induce global models that are more accurate and more general than the local models. This chapter introduces a ...
Grid Data Mining by Means of Learning Classifier Systems and Distributed Model Induction
This paper introduces a distributed data mining approach suited to grid computing environments based on a supervised learning classifier system. Different methods of merging data mining models generated at different distributed sites are explored. Centralized Data Mining (CDM) is a conventional method of data mining in distributed data. In CDM, data stored in distributed locations have ...
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach using random forest as a base classifier to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small amount of labeled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
Publication date: 2011